Abstract. In two previous papers [AGM02, AGM08] we computed cohomology
groups H^5(Γ_0(N); C) for a range of levels N, where Γ_0(N) is the congruence
subgroup of SL_4(Z) consisting of all matrices with bottom row congruent
to (0, 0, 0, *) mod N. In this note we update this earlier work by carrying
it out for prime levels up to N = 211. This requires new methods in sparse
matrix reduction, which are the main focus of the paper. Our computations
involve matrices with up to 20 million nonzero entries. We also make two
conjectures concerning the contributions to H^5(Γ_0(N); C) for N prime coming
from Eisenstein series and Siegel modular forms.



1. Introduction 

In two previous papers [AGM02, AGM08] we computed the cohomology in degree 5
of congruence subgroups Γ_0(N) ⊂ SL_4(Z) with trivial C coefficients, where
Γ_0(N) is the subgroup of SL_4(Z) consisting of all matrices with bottom row
congruent to (0, 0, 0, *) mod N. The highest level we reached was N = 83. We also
computed some Hecke operators on these cohomology groups and identified the
cohomology as either cuspidal or as coming from the boundary (Eisensteinian).

In this paper we concentrate on explaining new techniques we have developed to
reduce very large sparse matrices. These techniques have enabled us to carry out
our computations for much higher values of the level N. We explain in Section 2
that our algorithms differ from others in the literature because we must compute
change of basis matrices. As an oversimplified illustration, imagine solving Ax = b
for an invertible matrix A. Classical dense methods produce an invertible change
of basis matrix P where PA has a desirable form, and we solve for x by computing
Pb. When A is large and sparse, computing P is much too expensive if finding x
is our only goal. Iterative methods like Wiedemann's produce x more simply. In
this paper, however, we compute explicit cocycles in our cohomology groups, and
compute their images under Hecke operators. As explained in (§2.2), change of basis
matrices are essential for this task. (See Section 2 for references. The illustration
is oversimplified because the actual A have less than full rank and are not even
square.)
The linear algebra issues are compounded when computing the Smith normal
form (SNF) of integer matrices. Modern SNF methods reduce mod p for a range
of p, solve at each p by iterative methods, and lift to a final result over Z by Chinese
remaindering techniques. It seems unknown, though, how to find the SNF change
of basis matrices by Chinese remaindering. Hence we use a different approach (§2.1).
See (§2.3.6) for a comparison of times. Although iterative methods can be efficiently
parallelized, this paper does not use parallel techniques.

In future installments of this project we will look at torsion classes in H^5(Γ_0(N); Z)
as well as twisted coefficient modules. For the torsion classes, we will test Conjecture B
of [Ash92], which asserts that they have attached Galois representations. The
new sparse matrix techniques discussed here will be of great importance in carrying
this project forward.

In the paper [vGvdKTV97] of Van Geemen, Van der Kallen, Top, and Verberkmoes,
cohomology classes for GL(3) were found by working modulo small primes
and using the LLL algorithm to reconstruct integral solutions. This is a useful
method that various people (including ourselves in the past) have followed. However,
in this paper we work solely modulo a 5-digit prime p without lifting to Z.
Lifting to Z would be a huge computational effort at larger levels. The prime p
is small enough to make computation fast, and large enough to make us morally
certain that we are actually finding the complex Betti numbers and Hecke eigenvalues.
The fact that all our data is accounted for by Eisenstein series and liftings of
automorphic forms confirms this.

We continue to find that the cuspidal part consists of functorial lifts of Siegel
modular forms from paramodular subgroups of Sp_4(Q) that are not Gritsenko lifts,
as described in [AGM08] for levels N = 61, 73, 79. We conjecture that these
functorial lifts will always occur, at least for prime level, in Conjecture 2 of Section 4.
These lifted forms correspond to selfdual automorphic representations on GL(4)/Q.
We were hoping to find non-lifted cuspidal cohomology classes, which would
correspond to non-selfdual automorphic representations. Unfortunately, we found none.
We see no reason why they should not exist for larger N, but no one has proven
their existence. It should be noted that non-selfdual Galois representations, which
by Langlands' philosophy would predict non-selfdual automorphic representations
for GL(4) of the type we are searching for, were constructed by J. Scholten [Sch02].

Our data for the boundary cohomology for prime level leads to our Conjecture 1
of Section 4, which identifies its constituents as various Eisensteinian lifts of certain
classical cuspforms of weights 2 and 4, and of certain cuspidal cohomology classes
for GL(3)/Q. This conjecture is a refinement of Conjecture 1 of [AGM08].

We ought to have similar conjectures for composite level, but we don't have
enough data to justify an attempt to make them. The size of the matrices and the
complexity of computing the Hecke operators increase as N grows larger or more
composite. Therefore at a certain point we stopped computing for composite N
but were able to continue for prime N up to level 211. Similarly the size of the
computation of the Hecke operators at a prime l increases dramatically with l, so
that in fact for the new levels in this paper, we compute the Hecke operators at
most for l = 2.

The index of Γ_0(N) in SL_4(Z) grows like O(N^3). Thus the matrices we need to
reduce are on the order of N^3 × N^3. This growth in size is what makes this
computational project so much more difficult to carry out for large N, compared to the
cohomology of congruence subgroups of SL_2(Z), SL_3(Z), Sp_4(Z), and other lower
rank groups. The implied constant in the O(N^3) is larger when N is composite,
which is why we eventually had to restrict ourselves to prime N. Also the Hecke
operator computations become much more lengthy for composite N.

Please refer to [AGM02] for explanations of why we look in degree 5, the de- 
composition of the cohomology into the cuspidal part and the part coming from 
the Borel-Serre boundary, and how we perform the computations. We also review 
there the number theoretic reasons for being interested in this cohomology, primar- 
ily because of connections to Galois representations and motives. In [AGM08] the 
reader will find how we identified the cuspidal part of the cohomology as lifted from 
GSp(4)/Q and why this must be so for selfdual classes of prime level. 

In [AGM02] we explained that the well-rounded retract W of the symmetric space
for SL_4(R) is a contractible cell complex on which SL_4(Z) acts cellularly with finite
stabilizers of orders divisible only by 2, 3 and 5, and that W has only finitely many
cells modulo SL_4(Z). Therefore we can use the chain complex C^*(W/Γ_0(N); F) to
compute H^*(Γ_0(N); F) for any field F of characteristic not equal to 2, 3 or 5. In
practice we substitute a large finite field for C as justified in [AGM02].

Also in [AGM02] we described explicitly and in detail how to handle the data
structures needed to construct the chain complex from W/Γ_0(N) and hence to
create the matrices whose nullspaces modulo column spaces are isomorphic to the
cohomology.

In this paper we continue to use this set-up. The new thing here is the method
explained in Section 2, which enables us to take N as far as 211.

In Section 3 we give the background on Eisenstein cohomology and Siegel modular
forms needed to present our computational results and to formulate our conjectures.
Finally, in Section 4 we state two conjectures about the structure of
H^5(Γ_0(N); C) for N prime, give the results of our computations of H^5(Γ_0(N); C)
for N prime, 83 ≤ N ≤ 211, and verify the two conjectures in this range. The first,
Conjecture 1, improves on [AGM02, Conjecture 1] by fixing the weight 4 part of
the Eisensteinian cohomology to those weight 4 cuspforms f whose central special
value vanishes. We also feel confident now of conjecturing that our list of classes
for the boundary cohomology is complete in this case. The second, Conjecture 2,
states exactly which cuspidal paramodular Siegel forms at prime level show up in
the cuspidal cohomology.

2. Computational Methods 
Our problem is to find H^5 of a complex of free R-modules for some ring R,

(1)    C^6 <--d^5-- C^5 <--d^4-- C^4.

Let n_i = rank C^i. View the C^i as spaces of column vectors, and represent the d^i as
matrices. All the matrix entries lie in Z. It is possible to carry out our computations
over R = Z, obtaining the torsion in the cohomology along with the free part. Our
next paper (IV in this series) will study the torsion. In the present paper, we work
over a field. However, we will intersperse remarks on the computational problem
for more general R, with an eye to future papers in the series.

In principle we want to work over R = C, because our purpose is to study
automorphic forms. To avoid round-off error and numerical instability, however,
we replace C with a finite field F_p of prime order p. If p is large enough, it is
extremely likely that dim H^5(Γ_0(N); F_p) will equal dim H^5(Γ_0(N); C) for the N we
consider, and that Hecke computations will work compatibly. We generally use
p = 12379, the fourth prime after 12345. We give details about the choice of p
in (§2.3.3).

The matrices are sparse matrices, meaning only a small fraction of the entries
in each row and column are nonzero. Our largest matrices, those for d^4, have
dimension n_5 × n_4 ≈ N^3/10 × 25N^3/72 for prime N. However, at most 6 of the
entries in each column are nonzero, and at most 26 in each row. The 6 and 26 are
independent of N. The matrices for d^5 have dimension n_6 × n_5 ≈ N^3/96 × N^3/10
for prime N. All these estimates are a few times larger for composite N. We give
more precise estimates for the n_i in (§2.1.4). The relative sizes are shown in Figure 1.

Figure 1. Relative sizes of the matrices d^4, d^5 in our cochain
complex. We will exploit the fact that d^4 is wider than its height.

Given an m x n sparse matrix A, our fundamental computational task is to find 
the Smith normal form (SNF) decomposition 

(2) A = PDQ. 

Here P ∈ GL_m(R), Q ∈ GL_n(R), D is zero except on the diagonal, and the nonzero
diagonal entries satisfy d_ii | d_{i+1,i+1} for all i. There is a ρ with d_ii ≠ 0 for 0 ≤ i < ρ
and d_ii = 0 for i ≥ ρ; when R is a field, ρ is the rank of A. We call P and Q change
of basis matrices.

To carry out the calculations, we used Sheafhom 2.2, a free software package
written by the third author [McCb, McCa]. Sheafhom performs large-scale
computations in the category of finitely-generated R-modules, where R is any
principal ideal domain supporting exact computation. Most development work
in Sheafhom has been for domains that are not fields, especially Z and other
rings of integers of class number 1. In this sense it differs from most of the sparse
matrix literature, which looks at the real and complex numbers
[DER89, GVL96, Dav06, Mat03] or finite fields [Wie86, LO91, PS92, Tei98]. The
differences are because we need to compute P and Q, as explained in the
introduction. For matrices over Z, one can find the SNF D matrix efficiently by
reducing modulo a number of primes [DSV01] [DEVGU07], or by other techniques
[HHR93] [BCP97]. Yet it is not clear how to find P and Q by reducing by multiple
primes. The need for the change of basis matrices is why Sheafhom aims to work
globally.

Fill-in is a concern in sparse linear algebra over any ring R. Imagine two vectors
that both have a nonzero at i_0. Add a scalar multiple of the first to the second in
order to zero out the value at i_0. In general, the result will have nonzeros in the
union of the positions where the originals had nonzeros, apart from i_0. (We follow
the standard convention in the sparse matrix literature and abbreviate "nonzero
entries" to "nonzeros.") For very sparse vectors, the number of nonzeros almost
doubles. Fill-in is this growth in the number of nonzeros.
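
As a minimal sketch of fill-in, assume sparse vectors over F_p are stored as Python dictionaries mapping an index to a nonzero value. This is a representation chosen only for illustration, not Sheafhom's, and the helper name `eliminate` is hypothetical. Clearing position i_0 of one vector with a multiple of another leaves nonzeros on the union of the two supports, minus i_0.

```python
P = 12379  # the prime the paper says it generally uses


def eliminate(u, v, i0, p=P):
    """Return v - (v[i0] / u[i0]) * u over F_p as a dict of nonzeros.

    u and v are dicts {index: nonzero value mod p}; both must be nonzero at i0.
    """
    factor = v[i0] * pow(u[i0], -1, p) % p  # modular inverse needs Python 3.8+
    out = dict(v)
    for i, x in u.items():
        val = (out.get(i, 0) - factor * x) % p
        if val:
            out[i] = val
        else:
            out.pop(i, None)  # entries that cancel are dropped
    return out


u = {0: 1, 3: 5, 7: 2}
v = {0: 4, 2: 9, 5: 1}
w = eliminate(u, v, 0)
# w is supported on {2, 3, 5, 7}: four nonzeros where u and v had three each.
```

With longer vectors the same effect compounds at every elimination step, which is why pivot choice matters so much.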

A separate issue when R is infinite is integer explosion. Over Z, the length 
(number of digits) of the product of two numbers is roughly the sum of the lengths 
of the factors. A vector operation that zeroes out one position will tend to increase 
the length of the numbers in the rest of the vector. Sheafhom's purpose is to avoid 
fill-in and integer explosion as much as possible. With R = F_p, integer explosion is
not an issue, and the focus is on avoiding fill-in. 

We find the SNF by performing a sequence of elementary operations: permuting
two columns, adding a scalar multiple of one column to another, and multiplying a
column by a unit of R, plus the same for rows. The algorithm is described in (§2.1.2).

Algorithms that operate on only one side of the matrix are more familiar. These
include the Hermite normal form (HNF) A = HQ [Coh93, (2.4.2)]. Over a field,
HNF is the same as Gaussian elimination on columns, with H in column-echelon
form and Q ∈ GL_n(R).

In principle, we prefer SNF to HNF because we are working with cochain complexes.
To evaluate Hecke operators on H^i we need to compute with the map
ker d^i → (ker d^i)/(im d^{i−1}) that reduces cocycles modulo coboundaries. This
requires the P matrix of d^{i−1} and the Q matrix of d^i. When computing all of H^*, it
is natural to compute P and Q for all the d^i at the same time.

When the n_i are very large, however, we must compromise by omitting computation
of the change-of-basis matrices that are not needed. Since this paper is about
H^5, we compute for d^5 only D and Q, and for d^4 only P and D. The biggest savings
arise because the largest matrices, d^4, are significantly wider than their height, as
Figure 1 shows. The Q matrices for d^4, those on the longer side, are fortunately
the ones we can forget.

HNF does have one advantage over SNF when one is forgetting the Q matrix: it
can be computed by the following disk HNF algorithm. Write the whole matrix A to
disk, then read it back in one column at a time. As one reads each column, one puts
the matrix accumulated so far into HNF. Over R = F_p, this means using standard
Gaussian elimination on columns, with no special pivoting algorithm. Again, the
savings arise because d^4 has a short side and a long side. The echelon form never
exceeds n_5 × n_5, the square on the short side.
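
The column-at-a-time idea can be sketched as follows. This is an illustration under stated assumptions, not Sheafhom's code: columns are dictionaries {row: value}, an in-memory iterable stands in for the file on disk, and the names `axpy` and `streaming_column_echelon` are hypothetical. The sketch also keeps the accumulated columns fully reduced, which is one possible reading of "puts the matrix accumulated so far into HNF."

```python
P = 12379


def axpy(a, b, f, p=P):
    """Return a + f*b for sparse vectors stored as dicts {index: value mod p}."""
    out = dict(a)
    for i, x in b.items():
        val = (out.get(i, 0) + f * x) % p
        if val:
            out[i] = val
        else:
            out.pop(i, None)
    return out


def streaming_column_echelon(columns, p=P):
    """Consume columns one at a time, keeping an echelon basis of their span.

    Returns {pivot_row: column}; each stored column has entry 1 at its own
    pivot row and 0 at every other pivot row.
    """
    basis = {}
    for col in columns:  # in the paper, columns are read back from disk
        col = {i: x % p for i, x in col.items() if x % p}
        for r, b in basis.items():  # clear entries at existing pivot rows
            f = col.get(r, 0)
            if f:
                col = axpy(col, b, -f, p)
        if not col:
            continue  # column was dependent on earlier ones
        r = min(col)  # no special pivoting: take the first nonzero row
        inv = pow(col[r], -1, p)
        col = {i: x * inv % p for i, x in col.items()}
        for r2 in list(basis):  # keep older columns zero at the new pivot row
            f = basis[r2].get(r, 0)
            if f:
                basis[r2] = axpy(basis[r2], col, -f, p)
        basis[r] = col
    return basis


cols = [{0: 1, 1: 2}, {0: 2, 1: 4}, {1: 1, 2: 3}]
echelon = streaming_column_echelon(cols)
```

However many columns are streamed in, the number of stored columns never exceeds the number of rows, which mirrors the n_5 × n_5 bound in the text.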

With the wrong matrices, though, disk HNF is a recipe for disaster. It can
succeed only if the matrix has low co-rank. The co-rank of an m × n matrix is
min(m, n) minus the rank ρ of the matrix. Assume m < n from now on (this holds
for our η and d^4), so the co-rank is m − ρ. Imagine that, after some work, one
must put such a matrix into column-echelon form using Gaussian elimination. We
claim that the echelon form will have a great deal of fill-in, no matter how cleverly
the pivots are chosen. The echelon form will have ρ pivot rows with only a single
nonzero. The remaining m − ρ rows will in general be dense: no pivot has had
a chance to clear out their entries, and by the law of averages they will mostly
be nonzero. Hence there are about (m − ρ) · ρ = (co-rank) · (rank) nonzeros in
the echelon matrix. We cannot stop ρ from being large. But when m − ρ is also
large, the product (m − ρ) · ρ is prohibitively large. These observations are for
the final result of a computation; usually fill-in is even worse in the middle of the
computation, before all the pivot rows have been established.

The main technical observation of this paper is to use the change of basis matrices
in a simple way to transform d^4 into an equivalent matrix η of low co-rank. We
start with SNF on η, switch to disk HNF when the fill-in of SNF forces us to do so,
and rely on the low co-rank to make disk HNF succeed. The matrix η is defined in
Equation 3 below.

In (§2.1) and (§2.2) we present these ideas more precisely.

2.1. Computing the SNF. Let A be an m × n matrix with entries in a field R
where exact computation is possible. We define elementary matrices P_l ∈ GL_m(R)
as usual [Jac85, p. 182]. These are permutation, translation, and dilation matrices,
corresponding to the elementary operations listed above. Replacing A with P_l A or
P_l^{-1} A performs an elementary row operation on A. Multiplying on the right by an
elementary matrix Q_l ∈ GL_n(R) performs an elementary column operation.

2.1.1. The Markowitz algorithm. Markowitz [DER89, (7.2)] [HHR93] is a
well-established approach to reducing fill-in. It is a greedy algorithm, reducing fill-in
as much as possible at each step. Let a_ij be a nonzero. Let r_i be the number of
nonzeros in a_ij's row, and c_j the number in its column. If one adds a multiple of
row i to row k in order to zero out the a_kj entry, one creates up to r_i − 1 new
nonzeros in row k. Using row i to clear out the entire column j produces up to
(r_i − 1)(c_j − 1) new entries. The Markowitz algorithm, in its simplest form, chooses
at each step the pivot a_ij that minimizes the Markowitz count (r_i − 1)(c_j − 1). (If R
were Z, we would also need to address integer explosion, by avoiding pivots with
large absolute value even if they have low Markowitz count.)

It can be slow to compute the Markowitz count for all a_ij. One approach is to
look at only a few rows with small r_i, say the smallest ten rows, and minimize the
Markowitz count only for those rows. Early versions of the Sheafhom code used
this approach. Currently, we prefer to avoid fill-in at whatever cost in speed, and
we always search over all entries. To speed up the search, we store the r_i and c_j in
arrays and update the arrays with every elementary operation.
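
The pivot search can be sketched in a few lines. The column-dictionary representation and the function name `markowitz_pivot` are assumptions made for this illustration. For brevity the sketch recomputes the row counts on every call, whereas the text describes keeping r_i and c_j in arrays that are updated incrementally.

```python
def markowitz_pivot(columns):
    """Return (count, i, j) minimizing (r_i - 1) * (c_j - 1) over all nonzeros.

    columns is a list of dicts {row index: nonzero value}, one per column.
    Returns None if the matrix has no nonzeros. Ties go to the first entry found.
    """
    row_len = {}
    for col in columns:
        for i in col:
            row_len[i] = row_len.get(i, 0) + 1
    best = None
    for j, col in enumerate(columns):
        c_j = len(col)
        for i in col:
            count = (row_len[i] - 1) * (c_j - 1)
            if best is None or count < best[0]:
                best = (count, i, j)
    return best


# Column 2 has a single nonzero, so pivoting there creates no fill-in.
cols = [{0: 1, 1: 1, 2: 1}, {0: 1, 1: 1}, {2: 1}]
pivot = markowitz_pivot(cols)
```

A count of zero means the pivot's row or column has no other nonzeros, so eliminating with it cannot create new entries.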

2.1.2. Statement of the algorithm. We now describe Sheafhom 2.2's SNF
algorithm. Implementation details are deferred to (§2.3).

The main strength of the algorithm is the interplay between the Markowitz count 
and disk HNF when the co-rank is low. Outside these aspects, it is like many SNF 
algorithms [Jac85, (3.7)] [Coh93, (2.4.4)]. 

The algorithm uses an index c, the pivot index or corner, which just says the
current pivot is at a_cc. The active region is where c ≤ i < m and c ≤ j < n.
Outside the active region, the matrix has nonzeros on the diagonal and nowhere
else.

The parameter r controls when we switch from SNF to disk HNF. It is chosen
by the user based on heuristics and experience; see (§2.3.6) for details. The part
of the algorithm with r is stated when we are forgetting the Q's, as we always do
for d^4; it would be easy to extend this part for the P's also.

If one does not need to remember P and Q, one simply omits the part of the
algorithm that writes them out. Our implementation overwrites A with D.

Input: an m × n sparse matrix A = (a_ij) with entries in a field R. If we are
not finding the Q change of basis matrices, we are given in addition a parameter r
(defaulting to r = ∞).

Output: A finite list (P_0, P_1, P_2, ...) of m × m elementary matrices, a finite list
(..., Q_2, Q_1, Q_0) of n × n elementary matrices, and an m × n diagonal matrix D,
such that

A = (P_0 P_1 P_2 ···) · D · (··· Q_2 Q_1 Q_0) = PDQ

as in Equation 2.

Step 0. Set c = 0. 

Step 1. If the number of nonzeros in the active region is > r, and if disk HNF 
has not run yet, run disk HNF on the active region. That is, write the active region 
to disk, find its column-echelon form while reading it back one column at a time, 
and replace the old active region in memory with the echelon form, keeping track 
of the Q's. 

Step 2. If the active region is entirely zero, including the case c = min(m, n), 
then return the lists of P's and Q's, return A overwritten with D, and terminate 
the algorithm. 

Step 3. Choose a nonzero in the active region that minimizes the Markowitz
count. This is the pivot. Use a row and column permutation if necessary to move
the pivot to the a_cc position (this is complete pivoting). If the row permutation is
A → P_l^{-1} A, then append P_l to the right side of the list of P's. (Of course P_l = P_l^{-1}
for order-two permutations.) Similarly, append the column permutation Q_l to the
left side of the list of Q's.

Step 4. For all j with c < j < n and a_cj ≠ 0, subtract a multiple of column c from
column j to make a_cj = 0. For each of these elementary operations A → A Q_l^{-1},
append Q_l to the left side of the list of Q's.

Step 5. For all i with c < i < m and a_ic ≠ 0, subtract a multiple of row c
from row i to make a_ic = 0. For each of these elementary operations A → P_l^{-1} A,
append P_l to the right side of the list of P's. (If R were not a field, steps 4 and 5
would need to be extended when a_ij/a_cc has a nonzero remainder for some i, j in
the active region.)

Step 6. Increment c and go to step 1. 
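
Steps 0 and 2 through 6 can be sketched over F_p as follows. This is a small dense illustration, not Sheafhom's implementation: it stores A as a list of rows, recomputes the Markowitz counts from scratch at each step, omits Step 1 (the switch to disk HNF), and records each operation as a tuple whose format is an assumption made here.

```python
P = 12379


def snf_mod_p(A, p=P):
    """Diagonalize A over F_p; return (D, P_ops, Q_ops).

    P_ops grows on the right and Q_ops on the left, as in the text. Each
    entry describes the operation performed: ("swap", a, b) exchanges two
    rows/columns, ("add", c, t, f) means row/column t had f times row/column c
    subtracted from it.
    """
    A = [[x % p for x in row] for row in A]
    m, n = len(A), len(A[0])
    P_ops, Q_ops = [], []
    c = 0                                                  # Step 0
    while c < min(m, n):
        r = {i: sum(1 for j in range(c, n) if A[i][j]) for i in range(c, m)}
        k = {j: sum(1 for i in range(c, m) if A[i][j]) for j in range(c, n)}
        cands = [((r[i] - 1) * (k[j] - 1), i, j)
                 for i in range(c, m) for j in range(c, n) if A[i][j]]
        if not cands:                                      # Step 2
            break
        _, i, j = min(cands)                               # Step 3
        if i != c:
            A[c], A[i] = A[i], A[c]
            P_ops.append(("swap", c, i))
        if j != c:
            for row in A:
                row[c], row[j] = row[j], row[c]
            Q_ops.insert(0, ("swap", c, j))
        inv = pow(A[c][c], -1, p)
        for j in range(c + 1, n):                          # Step 4
            if A[c][j]:
                f = A[c][j] * inv % p
                for row in A:
                    row[j] = (row[j] - f * row[c]) % p
                Q_ops.insert(0, ("add", c, j, f))
        for i in range(c + 1, m):                          # Step 5
            if A[i][c]:
                f = A[i][c] * inv % p
                A[i] = [(x - f * y) % p for x, y in zip(A[i], A[c])]
                P_ops.append(("add", c, i, f))
        c += 1                                             # Step 6
    return A, P_ops, Q_ops


D, P_ops, Q_ops = snf_mod_p([[2, 4, 0], [1, 2, 0], [0, 1, 1]])
# D is diagonal with two nonzero entries, matching the rank of the input.
```

A sparse implementation would differ mainly in data structures and in updating the counts incrementally; the order of operations is the same.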

2.1.3. Representing change-of-basis matrices. It is important that we return P and
Q as lists of elementary matrices. The products P_0 P_1 P_2 ··· and ··· Q_2 Q_1 Q_0 are
likely to be dense; we could never hold them in RAM, much less compute their
inverses. Fortunately, it is easy to work with them as lists. Given a matrix B,
compute QB by multiplying B on the left by Q_0, Q_1, and so on. To compute
Q^{-1} B, run through the Q_l in the opposite order, decreasing l, and multiply B on
the left by Q_l^{-1}. Similar comments apply to the P_l, and to transposes.

The lists are still too big to fit in RAM, so we store them on disk. We read once 
through them every time we need to operate with P or Q. We use a text-based data 
format where each elementary matrix takes up only about 20 characters. Storing a 
translation matrix, for example, only requires storing the two indices and the value 
of the off-diagonal entry. Reading the elementary matrices in left-to-right order is 
straightforward. To read in right-to-left order, we use a pointer that steps backward
through the file. 
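
A minimal sketch of working with Q as a list, assuming each elementary matrix is recorded as a small tuple (the tuple format and function names are hypothetical, and the list lives in memory here rather than on disk). Applying Q walks the list forward; applying Q^{-1} walks it backward and inverts each factor.

```python
P = 12379


def apply_elementary(op, B, p=P):
    """Left-multiply B (a list of rows) in place by one elementary matrix."""
    kind = op[0]
    if kind == "swap":                 # permutation: exchange rows a and b
        _, a, b = op
        B[a], B[b] = B[b], B[a]
    elif kind == "add":                # translation: row dst += f * row src
        _, src, dst, f = op
        B[dst] = [(x + f * y) % p for x, y in zip(B[dst], B[src])]
    else:                              # dilation: row a *= unit u
        _, a, u = op
        B[a] = [u * x % p for x in B[a]]


def inverse(op, p=P):
    """Return the elementary operation that undoes op."""
    kind = op[0]
    if kind == "swap":
        return op
    if kind == "add":
        _, src, dst, f = op
        return ("add", src, dst, -f % p)
    _, a, u = op
    return ("scale", a, pow(u, -1, p))


def apply_Q(q_list, B):
    """q_list = [Q_0, Q_1, ...] with Q = ... Q_2 Q_1 Q_0; computes Q B."""
    for op in q_list:
        apply_elementary(op, B)
    return B


def apply_Q_inverse(q_list, B):
    """Computes Q^{-1} B by running through the list with decreasing index."""
    for op in reversed(q_list):
        apply_elementary(inverse(op), B)
    return B


ops = [("add", 0, 2, 7), ("swap", 0, 1), ("scale", 2, 3)]
B = [[1, 2], [3, 4], [5, 6]]
QB = apply_Q(ops, [row[:] for row in B])
back = apply_Q_inverse(ops, [row[:] for row in QB])
```

Each record needs only a couple of indices and one value, which is consistent with the compact on-disk format the text describes.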

2.1.4. Sizes of the matrices. We will need more precise estimates for the n_i =
dim C^i. We refer the reader to [AGM02] for the definitions. The n_i are
approximated by sums of terms of the form |P^3(Z/NZ)|/|Γ_σ|. Here P^3 = P^3(Z/NZ)
is projective three-space over the ring Z/NZ, and the Γ_σ are the stabilizer groups of
certain basis cells in the well-rounded retract W.



n_4 ≈ (1/8 + 1/6 + 1/24 + 1/72) |P^3| = (25/72) |P^3|.

If N has the prime factorization N = ∏ p_i^{e_i}, then

|P^3| = N^3 · ∏_i (1 + 1/p_i + 1/p_i^2 + 1/p_i^3).

When N is prime, this reduces to the familiar formula |P^3| = 1 + N + N^2 + N^3.
In general, if we consider the set {p_i} of primes dividing N to be fixed, then |P^3|
is a constant times N^3, where the constant depends on the set. In the range of N
we consider, up to the low 200's, the largest constant we encounter is 4.04, for
N = 2 · 3 · 5 · 7 = 210. This factor of 4 is why we said that our estimates for the n_i
are a few times larger for composite N than for prime N.
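
The size of P^3(Z/NZ) is easy to check numerically. The sketch below assumes the standard product formula |P^3(Z/NZ)| = N^3 · ∏ (1 + 1/p + 1/p^2 + 1/p^3) over the primes p dividing N; the function names are hypothetical.

```python
from fractions import Fraction


def prime_factors(n):
    """Return the distinct prime divisors of n by trial division."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps


def proj3_size(N):
    """|P^3(Z/NZ)| via the product over primes dividing N (exact arithmetic)."""
    size = Fraction(N) ** 3
    for p in prime_factors(N):
        size *= 1 + Fraction(1, p) + Fraction(1, p ** 2) + Fraction(1, p ** 3)
    return int(size)


prime_case = proj3_size(211)                  # equals 1 + N + N^2 + N^3
constant_210 = proj3_size(210) / 210 ** 3     # about 4.04
```

For prime N the constant is barely above 1, which is the source of the gap between prime and composite levels.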

The ≈ symbols arise because the orbits of Γ_σ in P^3 are generically of size |Γ_σ|
but are occasionally smaller. What is more, we only count the orientable orbits.
Experiments with N in the range 40 to 100 suggest that the error in the ≈ is at
worst 0.3% for prime N and 0.6% for composite N.

2.2. Computing cohomology. Let d^4 and d^5 be as in Formula 1, with n_i =
dim C^i. We now describe how we use Sheafhom to compute H^5 of the complex.

First, compute the SNF d^5 = P_5 D_5 Q_5, with P_5 omitted. Let ρ_5 = rank D_5 =
rank d^5.

Second, define

(3)    η = Q_5 d^4.

Since d^* is a cochain complex, the topmost ρ_5 rows of η are zero. (Indeed,
d^5 d^4 = 0 and P_5 is invertible, so D_5 η = 0, and D_5 is nonzero exactly in its first ρ_5
diagonal positions.) Delete these rows, still calling the matrix η. Thus η has
dimension (n_5 − ρ_5) × n_4.

Third, compute the SNF η = P_η D_η Q_η, with Q_η omitted. Let ρ_η = rank D_η =
rank η. Note that rank d^4 = ρ_η, since d^4 and η differ only by a change of basis and
deleting some zero rows.

We can now report the Betti number h_5 = dim H^5:

h_5 = n_5 − ρ_5 − ρ_η.

We need not only the Betti number, though, but an explicit list z_1, ..., z_{h_5} of
cocycles in ker d^5 that descend modulo im d^4 to a basis of the cohomology. Let B
be the (n_5 − ρ_5) × h_5 matrix with the identity in the bottom h_5 × h_5 block and zeroes
elsewhere. Compute B = P_η B. Add to the top of B the ρ_5 rows of zeroes that we
deleted from η, still calling it B. Then the columns of Q_5^{-1} B are an appropriate
list of cocycles z_j.
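
The Betti number count can be checked on a toy complex. The sketch below is only an illustration: it uses plain Gaussian elimination for the ranks rather than the SNF and η described above, relies on the identity rank η = rank d^4, and all names and the example matrices are hypothetical.

```python
P = 12379


def rank_mod_p(M, p=P):
    """Rank of a dense matrix (list of rows) over F_p by row reduction."""
    M = [[x % p for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    rank = 0
    for j in range(cols):
        piv = next((i for i in range(rank, rows) if M[i][j]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][j], -1, p)
        for i in range(rank + 1, rows):
            f = M[i][j] * inv % p
            if f:
                M[i] = [(x - f * y) % p for x, y in zip(M[i], M[rank])]
        rank += 1
        if rank == rows:
            break
    return rank


def betti5(d4, d5, p=P):
    """h_5 = n_5 - rank(d5) - rank(d4), with d4 of size n5 x n4, d5 of size n6 x n5."""
    n5 = len(d4)
    assert all(len(row) == n5 for row in d5), "d5 must have n5 columns"
    return n5 - rank_mod_p(d5, p) - rank_mod_p(d4, p)


# Toy complex C^4 = F^2, C^5 = F^3, C^6 = F^1 with d5 * d4 = 0.
d4 = [[1, 0], [0, 0], [0, 0]]
d5 = [[0, 1, 0]]
h5 = betti5(d4, d5)
```

In this toy case the third basis vector of C^5 is a cocycle that is not a coboundary, so it represents the single cohomology class.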



COHOMOLOGY OF CONGRUENCE SUBGROUPS OF SL_4(Z). III






Our Hecke operator algorithm takes a cocycle y and maps it to its Hecke translate,
a cocycle y'. For simplicity, assume y = z_j. The Hecke translate z'_j is generally
not in the span of the z_j. Rather, z'_j = s_{1j} z_1 + ··· + s_{h_5 j} z_{h_5} + b_j, where the
s_ij ∈ R are scalars and b_j ∈ im d^4 is a coboundary. Computing the s_ij and b_j is
straightforward, since Q_5, P_η, and the z_j are all stored on disk. Ultimately, we express
each Hecke operator as the h_5 × h_5 matrix (s_ij) with respect to our cohomology
basis.

2.2.1. Co-rank of η. Using η may seem inelegant because it violates the symmetry
of a cochain complex. Since the complex is

C^6 <--[d^5 = P_5 D_5 Q_5]-- C^5 <--[d^4 = P_4 D_4 Q_4]-- C^4,

it is more elegant to compute H^5 using Q_5 and P_4, which are both n_5 × n_5 matrices.

However, η has one great virtue: by removing the rows of zeroes from its top, we
have dropped its co-rank down to h_5. We observe that h_5 is never more than 80 for
the N we consider, while ρ_5 could reach into the millions. This difference is what
allows disk HNF to succeed.

Let us give more precise estimates. The Betti number h_6 = dim H^6 equals n_6 − ρ_5,
since our chain complex has only 0's after degree 6. We observe that h_6 is never
more than about 40 in our range of N. Thus ρ_5 ≈ N^3/96 − 40. Estimating h_5 as 80,
the rank ρ_η = ρ_4 ≈ (N^3/10 − (N^3/96 − 40)) − 80. Both 40 and 80 are negligible
for large N, so ρ_4 ≈ (1/10 − 1/96)N^3 = (43/480)N^3. For η, the co-rank is again
about 80, meaning the number of entries in η's echelon form is (co-rank) · (rank)
≈ 80 · ρ_4 = 80 · (43/480)N^3 ≈ 7N^3. But the number of entries in d^4's echelon form
is ≈ (n_5 − ρ_4)ρ_4 ≈ ((1/10)N^3 − (43/480)N^3) · ((43/480)N^3) = (1/96)(43/480)N^6 ≈
0.0009N^6. At N ≈ 200, the latter is ≈ 7500N^3. In other words, the echelon form
of d^4 has about 1000 times more entries than the echelon form of η when N is
near 200.
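
The arithmetic in this estimate can be reproduced directly. The sketch below just evaluates the approximations stated in the text, with h_5 taken as 80 and the small additive corrections dropped; the function name is hypothetical.

```python
from fractions import Fraction


def echelon_entries(N, h5=80):
    """Estimated echelon-form sizes for eta and d^4 at prime level N."""
    rank4 = Fraction(43, 480) * N ** 3              # rho_4 ~ (1/10 - 1/96) N^3
    eta_entries = h5 * rank4                        # (co-rank) * (rank) for eta
    d4_entries = Fraction(1, 96) * N ** 3 * rank4   # (n_5 - rho_4) * rho_4
    return float(eta_entries), float(d4_entries)


eta_size, d4_size = echelon_entries(200)
ratio = d4_size / eta_size   # about 1000 at N = 200
```

The ratio is (N^3/96)/80, so the advantage of η grows like N^3.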

This analysis was for Gaussian elimination, not SNF. When we compute the SNF
of matrices of large co-rank, we observe the same behavior empirically. Figure 2
compares the fill-in for the SNF computations of d^4 and η at the same level N = 53.
Both matrices have 52766 columns and rank 13614. The example uses Markowitz
only, not disk HNF. We show only the pivot indices c ≥ 12000, since the graphs
look the same to the left of that point. The fill-in for d^4 is clearly the worse of the
two, with a peak over three times higher than for η. In general, the SNF algorithm
displays "catastrophic cancellation": the number of nonzeros in the active region
tends to grow rapidly until almost the end of the algorithm, when the number
decreases sharply to zero. Catastrophic cancellation begins for d^4 at row 13464 and
for η at about row 13084.

The fill-in for our smaller matrix d^5 is harder to predict. There are many columns
with only one or two entries. These allow Markowitz to reduce the matrix with
no fill-in at all, at least in the initial stages. Later, the number of nonzeros grows
rapidly as for d^4, with an even more precipitous cancellation at the end. Figure 3
gives the example for level N = 103.

2.3. Implementation. Sheafhom 2.2 is written in Common Lisp, which is the
ideal language for it in many ways. Common Lisp source code compiles to very
efficient machine code; carefully written Lisp is as fast as C++ [Gat00]. Yet writing
high-level code in Lisp is easier than in C or C++. Like its inheritor Python,
Lisp has an outer read-eval-print loop that promotes high-level thinking and rapid
development. Type safety is built in. Lisp's object-oriented package CLOS is the
most flexible of any language's. Arbitrary-precision integers and rationals are part
of the language standard and can be very fast.

Figure 2. Size of the active region during an SNF computation
for d^4 and η. Here M denotes million.

Figure 3. Size of the active region during an SNF computation for d^5.

Sheafhom 2.1 and 2.2 were developed with Allegro Common Lisp (ACL) from
Franz, Inc. Under Linux we use the free implementation Steel Bank Common Lisp
(SBCL). Sheafhom 2.0 was developed in Java. Sheafhom 1.x was developed in
Carnegie-Mellon University Common Lisp (CMUCL), the parent of SBCL. All the
Lisps mentioned here produce high-quality compiled code.

Sheafhom 1.x was restricted to R = Q, but had general support for the derived
category of complexes of sheaves over cell complexes. It was also ported to
Macaulay 2 [GS].

2.3.1. Overview of data structures. Any sparse matrix package uses data structures
that store only the nonzeros. Sheafhom stores a sparse matrix as an array of sparse
vectors representing the columns. A sparse vector is a linked list of nonzero sparse
elements. A sparse element represents an ordered pair (i, v) with i ∈ N, v ∈ R.
When this sparse element occurs in column j, it means the i,j entry of the matrix
has nonzero value v. The rest of (§2.3) gives further details.

2.3.2. Testing platforms. To explain the test data we will present later, we give some 
details about the two machines we use most. Portland, our Windows machine, is a 
laptop with a 1.30 GHz Celeron M processor and 992 MB of RAM, running ACL 8.1 
under Windows XP 2002. Athens, our Linux machine, is a 64-bit laptop with a 
2.20 GHz Intel Core2 Duo processor and 3.9 GB of RAM, running SBCL 1.0.12 
under Linux 2.6 (Ubuntu 8). 

2.3.3. Implementation of sparse elements. For general rings R, Sheafhom
implements the indices i ∈ N and values v ∈ R as objects. They are allocated on the
heap, and contain type information needed to use them in an object-oriented way.
For these rings, we store (i, v) as a pair of pointers to i and to v. The pair of pointers
is a native data structure in Lisp, the cons, the fundamental building block for
linked lists and tree structures.

The cons implementation is convenient, but, as in all languages, the indirection 
carries a penalty in efficiency. Lisp may store the numbers on a different memory 
page from the conses themselves, forcing the machine to flip back and forth between 
pages. 

When R = F_p, we implement sparse elements more efficiently. We typically use
primes p a little less than 2^16, so that sums and products in [0, p) can be computed
in a 32-bit integer before reducing mod p. Let k be the smallest integer such that
p < 2^k. (For our favorite prime 12379, k = 14.) Each v ∈ F_p fits in k bits. We
would like to pack (i, v) into a single 32-bit integer, but we cannot. With k = 14,
say, there would only be 18 bits left for the index, and we need a larger range of
indices than 2^18 ≈ 2.6 · 10^5. Therefore, on a 32-bit machine, we pack (i, v) into an
arbitrary-precision integer, but arrange things so the integer will fit into 32 bits at
the critical stages of the computation. Let M be an upper bound on the number
of rows in the matrix. We store (i, v) as the integer

(4)    (M − i − 1) · 2^k + v.

Near the end of a computation, when space is critical, i will be close to M. Hence
M − i − 1 will be small and Formula 4 will fit in 32 bits.

On a 64-bit machine, we store (i, v) even more simply as

(5)    i · 2^k + v.

For us, this never exceeds a 64-bit integer. When k = 14, for instance, it would
exceed it only if i > 2^{64−14} = 2^50 ≈ 10^15, whereas our largest m are around 10^6
or 10^7. By declaring the sparse element to be a 64-bit integer throughout the
program, we avoid the separate heap allocations for i and v, the pair of pointers
holding i and v, and the object-oriented indirection.

In all of Sheafhom's implementations of sparse elements, operations in the ring $R$
really operate on the sparse elements. For instance, the sum of $(i_1, v_1)$ and
$(i_2, v_2)$ is a new sparse element with value $v_1 + v_2 \in R$ and with some $i$
determined by the context, typically $i = i_1$.
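As an illustration of Formulas (4) and (5), here is a minimal Python sketch of the packing arithmetic. It is not Sheafhom's Lisp code: the names `pack64`, `pack32`, `add64` and the use of Python integers are assumptions made only for this example.

```python
# Illustrative sketch of Formulas (4) and (5); all names are hypothetical.
P = 12379                 # the prime used in the text
K = P.bit_length()        # smallest k with P < 2**k (here 14)
MASK = (1 << K) - 1       # low k bits hold the value v

def pack64(i, v):
    """Formula (5): i * 2^k + v."""
    return (i << K) | v

def unpack64(e):
    return e >> K, e & MASK

def pack32(i, v, m):
    """Formula (4): (M - i - 1) * 2^k + v, with m an upper bound on the rows."""
    return ((m - i - 1) << K) | v

def unpack32(e, m):
    return m - 1 - (e >> K), e & MASK

def add64(e1, e2):
    """Sum of two packed elements; the result keeps the index of the first."""
    i, v1 = unpack64(e1)
    _, v2 = unpack64(e2)
    return pack64(i, (v1 + v2) % P)
```

With `m = 845712`, an element whose index is close to `m` packs into well under 32 bits with `pack32`, while an element with index 0 does not, which is the behavior the text relies on near the end of a computation.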

2.3.4. Implementation of sparse vectors and matrices. Sheafhom's sparse matrix
data structure is an array of sparse vectors giving the columns, together with
arrays of the row lengths $r_i$ and column lengths $c_j$ used for Markowitz.
Sheafhom implements a sparse vector as a singly linked list of sparse elements,
sorted by increasing index $i$. The backbone of the linked list (as opposed to the
sparse elements it holds) is again built from conses. Implementing vectors as a
linked list is flexible, compared to constantly allocating and reallocating arrays
in C. Lists grow in size as fill-in occurs, and shrink as entries cancel each other
out. As they shrink, the memory is reclaimed by the garbage collector.

Sheafhom includes a macro with-splicer as a mini-language for surgery on lists.
The macro iterates down a list and offers read, insert, modify and delete
operations for individual elements, plus splicing commands to add and remove
larger chunks. We allocate as few new backbone conses as possible.

Implementing sparse vectors as singly-linked lists is flexible, as we have said,
but it involves the usual risks because accessing a single element might take time
linear in the length of the list. We can avoid this trap with a little care. To find
the sum $x + y$ of sparse vectors $x$ and $y$, for example, we run through the
vectors in parallel and accumulate the sum as we go. Similar comments apply to
scalar multiplication, subtraction, and the dot product.
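The parallel traversal can be sketched as follows. This is a hedged illustration rather than Sheafhom's code: it uses Python lists of `(index, value)` pairs in place of Lisp linked lists, works over $\mathbb{F}_p$ with $p = 12379$, and the function name `add_sparse` is hypothetical.

```python
P = 12379  # prime from the text; any modulus works for the sketch

def add_sparse(x, y, p=P):
    """Sum of two sparse vectors, each a list of (index, value) sorted by index.

    Both inputs are scanned once, so the cost is linear in len(x) + len(y).
    """
    out = []
    a = b = 0
    while a < len(x) and b < len(y):
        (i, v), (j, w) = x[a], y[b]
        if i < j:
            out.append((i, v))
            a += 1
        elif j < i:
            out.append((j, w))
            b += 1
        else:
            s = (v + w) % p
            if s:                  # entries that cancel are dropped
                out.append((i, s))
            a += 1
            b += 1
    out.extend(x[a:])              # at most one of these tails is nonempty
    out.extend(y[b:])
    return out
```

The result is again sorted by index, so it can be fed straight back into the same routine.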

One place we might encounter quadratic behavior is step 5 of the SNF algorithm
(§2.1.2). If the pivot row and column have $r_i$ and $c_j$ entries respectively, a
direct approach would require $(r_i - 1)(c_j - 1)$ operations, each linear in the
length of the column. The trick here is simply to put step 4 before step 5. Step 4
handles the column operations linearly as in the previous paragraph, and step 5
then has $r_i = 1$.

Another likely place for slow behavior is multiplying a sparse matrix $B$ on the
left by an elementary matrix. This includes the important computation of $\eta$
(Equation (3)). An elementary operation with rows $i$ and $j$ might involve a
linear sweep down $r_i + r_j$ columns. We handle this situation by taking the
transpose of $B$, multiplying on the right by the transposes of the elementary
matrices, and taking a final transpose to put everything back at the end. The
transpose of $B$ takes time linear in the number of entries, which for a sparse
matrix is small by definition.
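To make the linear-time claim concrete, here is a small Python sketch of a transpose for a matrix stored by columns. The column-of-pairs layout and the name `transpose` are assumptions for the example, not Sheafhom's actual data structures.

```python
def transpose(cols, nrows):
    """Transpose a sparse matrix stored by columns.

    cols[j] is a list of (i, v) pairs sorted by row index i.  The result has
    one list per row, holding (j, v) pairs sorted by column index j.  Each
    entry is touched once, so the time is linear in the number of nonzeros
    (plus the number of rows and columns).
    """
    rows = [[] for _ in range(nrows)]
    for j, col in enumerate(cols):
        for i, v in col:
            rows[i].append((j, v))   # j only increases, so each row stays sorted
    return rows
```

Applying the function twice returns the original column lists, which is the "final transpose" step described above.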

2.3.5. Comparison of sparse element implementations. Table 1 shows a test of
speed and space requirements for the three implementations of sparse elements
over $\mathbb{F}_{12379}$ on our Linux machine. We timed the SNF computation for
the $d^5$ matrix for level $N = 53$, the matrix of Figure 3. The matrix is
$15218 \times 52766$. We used only the Markowitz portion of (§2.1.2), with no disk
HNF. Since we were on a 64-bit machine, each sparse element in Formula (5) takes
8 bytes. Formula (4) has about the same space requirement, especially towards the
end of a computation when $i$ is close to $M$. The $(i, v)$ representation requires
8 bytes for each of $i$ and $v$, plus 16 bytes for the cons itself, for a total of
32 bytes. To all these figures we must add 16 bytes for the cons in the backbone of
the sparse vector that holds the sparse element.

2.3.6. Overall speed and space requirements. To summarize, our implementations of
sparse elements are optimized for both space and speed, and our sparse vector and
matrix algorithms avoid speed traps that would lead unnecessarily to quadratic
time. On the other hand, at the higher layers of the algorithm, we sacrifice speed,
minimizing fill-in at all costs. For instance, we mentioned in (§2.1.1) that we do a
full Markowitz scan at each pivot step. This takes about one third of the total
time for the algorithm until we switch to disk HNF.

The largest matrix we have handled so far has size $845{,}712 \times 3{,}277{,}686$.
This is the $\eta$ matrix for level $N = 211$. It has close to 20 million nonzeros.
We carried out the computation on our Linux machine, using the implementation of
$\mathbb{F}_{12379}$ in Formula (5). The sizes of $d^4$ and $d^5$ are
$98{,}351 \times 944{,}046$ and $944{,}046 \times 3{,}277{,}686$, respectively. We
reduced $d^4$ using only the Markowitz portion of (§2.1.2), with no disk HNF. We
reduced $\eta$ using both Markowitz and disk HNF, switching when the active region
had $r$ equal to about 116 million nonzeros. Converting to $\eta$ as in Equation (3)
took significant time by itself, since it called for over three million elementary
column operations on the transpose of $d^5$. How the running time broke down is
shown in Table 2.

It is interesting to compare our running times with those in [DEVGU07]. They
compute for $\mathrm{GL}_7(\mathbb{Z})$ at level $N = 1$, while we do
$\mathrm{SL}_4(\mathbb{Z})$ at $N = 211$. The number of nonzeros is roughly
comparable, 37 million versus 20 million. Both computations took about one month.
They computed mod $p$ for several $p$, but used 50 processors; we did one $p$ on one
processor. We find ourselves joking that

    $\mathrm{GL}_7 = \mathrm{GL}_4 + 211$.

In Table 2 we do not distinguish between wall-clock time and CPU time, because
they are essentially the same. We ran the computation on our Linux machine on a
single processor. The machine was doing nothing besides the computation.
Occasional tests with top showed the CPU running consistently at 100%. We presume
one of the two cores ran the computation, and the other took care of background
jobs like displaying the desktop.

Recall that $r$ is the maximum number of nonzeros allowed in the active region
before switching from Markowitz to disk HNF. Table 3 shows the largest $r$ we have
used successfully. These values depend on the chosen implementation of sparse
elements, as well as on the operating system and version of Lisp. A + means we have
relatively little data for this combination of parameters, and $r$ could likely go
higher than shown. Values without a + represent a reasonable maximum, determined by
long experience and many out-of-memory crashes. The number of bytes is computed as
in (§2.3.5) for our Linux machine. On our Windows machine, a 32-bit integer counts
4 bytes, a cons 8 bytes.
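The space accounting can be checked with a few lines of Python. This is only a sketch of the arithmetic described in the text: the function name `bytes_per_entry` is hypothetical, and treating the rows with 12 and 24 bytes per entry as packed-integer implementations is an assumption read off from those byte counts.

```python
def bytes_per_entry(word, cons, as_cons):
    """Bytes for one sparse entry, following the accounting in the text.

    word: size of one machine integer; cons: size of one cons cell.
    as_cons=True is the (i, v) pair of pointers; otherwise one packed integer.
    """
    element = (2 * word + cons) if as_cons else word
    return element + cons            # plus one backbone cons per entry

linux = dict(word=8, cons=16)        # 64-bit machine
windows = dict(word=4, cons=8)       # 32-bit machine

rows = [  # (machine, as_cons, largest r in millions), as listed in Table 3
    (windows, True, 30),
    (windows, False, 22),
    (linux, False, 42),
    (linux, False, 148),
]
for machine, as_cons, r in rows:
    b = bytes_per_entry(as_cons=as_cons, **machine)
    print(b, "bytes per entry,", r * b, "MB in total")
```

The products come out as 720, 264, 1008 and 3552 MB, in line with the totals quoted in Table 3.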

2.3.7. Other approaches. We mention a few more sparse matrix techniques that 
appear in the literature. 

Many scientific applications involve sparse matrices with a pattern, such as
tridiagonal or banded diagonal. The matrices in this paper have no recognizable
sparsity pattern. A matrix coming from a d-dimensional topological space would have
a pattern in d dimensions, but not when flattened into a two-dimensional table of
rows and columns.

    implementation        total time    space per entry
    (i, v) as a cons      2768 sec      48 bytes
    Formula (4)           1385 sec      24 bytes
    Formula (5)            784 sec      24 bytes

Table 1. Comparison of sparse element implementations.

    step                                 time
    SNF of $d^4$                         ½ day
    converting to $\eta$                 5½ days
    SNF of $\eta$, Markowitz portion     12 days
    SNF of $\eta$, disk HNF portion      13 days

Table 2. Overall speed for level N = 211.

    machine and RAM   implementation     largest r   space per entry (bytes)   total space
    Windows 1GB       (i, v) as a cons   30M         24                        720MB
    Windows 1GB       Formula (4)        22M+        12                        264MB+
    Linux 4GB         Formula (4)        42M+        24                        1.0GB+
    Linux 4GB         Formula (5)        148M        24                        3.55GB

Table 3. Maximum number of nonzeros allowed in the active region.

The RSA challenge matrices described in [PS92] had some special properties.
The columns on the left were very sparse, and could be used to clean out the
somewhat denser columns on the right. Rows and columns with only two entries
gave an even quicker reduction step [LO91]. The [DEVGU07] matrices had many
rows with only one entry [DEVGU07, 2.6.4], a result of cancellation at the low level
$N = 1$. By and large, our matrices do not have these properties. The sparsity is
almost entirely uniform. The $d^4$ have a substantial fraction of columns with one
or two entries, but not $d^5$.

Block-triangularization is another well-established technique for sparse matrices.
Given an $m \times n$ matrix $A$, we look for permutation matrices $P_b$ and $Q_b$ so
that $B = P_b A Q_b$ is block upper-triangular: it has square blocks down the
diagonal and nothing but zeroes below the blocks. The matrix can then be reduced one
block at a time, either to HNF or SNF. The upper-triangular part can be handled
directly by back-solving. Since we only permute the matrix entries, there is no
fill-in and no integer explosion. Assume for the moment that $A$ is square and
of full rank, and that after a row permutation the diagonal $a_{ii}$ is all nonzero.
For such $A$, the block-triangular form is unique, and the way to find it is well
known [DER89, Ch. 6]. When $A$ is not of full rank, the block decomposition is
generalized to the Dulmage-Mendelsohn decomposition, which is roughly
upper-triangular [Dav06, (8.4)]. In our case, $A$ is a fraction of a percent away
from full rank and from having nonzeros down the diagonal; for square $A$ of this
type, finding the Dulmage-Mendelsohn decomposition takes about the same time and
space resources as block decomposition. So far these algorithms are polynomial
time. A new idea is needed, however, when $A$ is not square but rectangular, as it
is for us. One can find the Dulmage-Mendelsohn decomposition of the left-hand
$m \times m$ square, then swap in columns from the wide section on the right in a way
that shrinks the left-hand diagonal blocks. Deciding which columns to swap is in
general an NP-hard problem. The third author has found good decompositions of
some rectangular $A$ using a heuristic for which columns to swap in. One iterates
the procedure "do Dulmage-Mendelsohn, then swap in columns" many times. We
defer the details to a future paper.
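For the square, full-rank case with a nonzero diagonal, the standard route to the block-triangular form goes through the strongly connected components of the directed graph with an edge $i \to j$ whenever $a_{ij} \neq 0$. The Python sketch below uses Tarjan's algorithm for this. It illustrates only that classical case, not the rectangular heuristic mentioned above, and the function name and input format are assumptions made for the example.

```python
def block_triangular_order(nonzeros, n):
    """Diagonal blocks of an n x n matrix whose diagonal is entirely nonzero.

    nonzeros is an iterable of (i, j) positions of nonzero entries.  The blocks
    are the strongly connected components of the graph with an edge i -> j for
    each off-diagonal nonzero, listed so that every nonzero lies on or above
    the block diagonal once rows and columns are permuted in this order.
    """
    adj = [[] for _ in range(n)]
    for i, j in nonzeros:
        if i != j:
            adj[i].append(j)
    index, low = {}, {}            # discovery numbers and low-links
    stack, on_stack = [], set()
    blocks = []

    def visit(v):                  # Tarjan's algorithm, recursive for brevity
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:     # v is the root of a component
            block = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                block.append(w)
                if w == v:
                    break
            blocks.append(sorted(block))

    for v in range(n):
        if v not in index:
            visit(v)
    blocks.reverse()               # components are emitted sinks first
    return blocks
```

The recursion is fine for small examples; for matrices of the sizes discussed in this paper one would want an iterative version.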

3. Eisenstein cohomology and paramodular forms

In this section we provide the necessary background to state Conjectures 1 and 2
and explain our computational results in Section 4.

3.1. Hecke eigenclasses and Galois representations. We will describe some
classes appearing in $H^5(\Gamma_0(N); \mathbb{C})$ in terms of the Galois
representations conjecturally attached to them. Thus we begin by recalling what
this means [AGM08].

Let $\xi \in H^5(\Gamma_0(N); \mathbb{C})$ be a Hecke eigenclass. In other words,
$\xi$ is an eigenvector for certain operators

    $T(l, k)\colon H^5(\Gamma_0(N); \mathbb{C}) \to H^5(\Gamma_0(N); \mathbb{C})$,

where $k = 1, 2, 3$ and $l$ is a prime not dividing $N$. These operators correspond
to the double cosets $\Gamma_0(N) D(l, k) \Gamma_0(N)$, where $D(l, k)$ is the
diagonal matrix with $4 - k$ ones followed by $k$ $l$'s. (One can also define
analogues of the operators $U_l$ for $l \mid N$, although we do not consider them in
this paper.) Suppose the eigenvalue of $T(l, k)$ on $\xi$ is $a(l, k)$. We define
the Hecke polynomial $H(\xi)$ of $\xi$ by

(6)    $H(\xi) = \sum_k (-1)^k l^{k(k-1)/2} a(l, k) T^k \in \mathbb{C}[T]$.

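Now we consider the Galois side. Let $\mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$
be the absolute Galois group of $\mathbb{Q}$. Let
$\rho\colon \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q}) \to \mathrm{GL}_n(\overline{\mathbb{Q}}_p)$
be a continuous semisimple Galois representation unramified outside $pN$. Fix an
isomorphism $\varphi\colon \mathbb{C} \to \overline{\mathbb{Q}}_p$. Then we say the
eigenclass $\xi$ is attached to $\rho$ if for all $l$ not dividing $pN$ we have

    $\varphi(H(\xi)) = \det(1 - \rho(\mathrm{Frob}_l) T)$,

where $\mathrm{Frob}_l \subset \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ is the
Frobenius conjugacy class over $l$. Let $\varepsilon$ be the $p$-adic cyclotomic
character, so that $\varepsilon(\mathrm{Frob}_l) = l$ for any prime $l$ coprime to
$p$. We denote the trivial representation by $i$.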

3.2. Eisenstein cohomology. Let $X = \mathrm{SL}_4(\mathbb{R})/\mathrm{SO}(4)$ be the
global symmetric space, and let $X^{BS}$ be the partial compactification constructed
by Borel and Serre [BS73]. The quotient $Y := \Gamma_0(N) \backslash X$ is an
orbifold, and the quotient $Y^{BS} := \Gamma_0(N) \backslash X^{BS}$ is a compact
orbifold with corners. We have

    $H^*(\Gamma_0(N); \mathbb{C}) \cong H^*(Y; \mathbb{C}) \cong H^*(Y^{BS}; \mathbb{C})$.

Let $\partial Y^{BS} = Y^{BS} \smallsetminus Y$. The Hecke operators act on the
cohomology of the boundary $H^*(\partial Y^{BS}; \mathbb{C})$, and the inclusion of
the boundary $\iota\colon \partial Y^{BS} \to Y^{BS}$ induces a map on cohomology
$\iota^*\colon H^*(Y^{BS}; \mathbb{C}) \to H^*(\partial Y^{BS}; \mathbb{C})$
compatible with the Hecke action. The kernel $H^*_!(Y^{BS}; \mathbb{C})$ of
$\iota^*$ is called the interior cohomology; it equals the image of the cohomology
with compact supports. The goal of Eisenstein cohomology is to use Eisenstein series
and cohomology classes on the boundary to construct a Hecke-equivariant section
$s\colon H^*(\partial Y^{BS}; \mathbb{C}) \to H^*(Y^{BS}; \mathbb{C})$ mapping onto a
complement $H^*_{\mathrm{Eis}}(Y^{BS}; \mathbb{C})$ of the interior cohomology in the
full cohomology. We call classes in the image of $s$ Eisenstein classes. (In
general, residues of Eisenstein series can give interior, noncuspidal cohomology
classes, with infinity type a Speh representation, but as noted in [AGM02], these do
not contribute to degree 5.)

3.3. Paramodular forms. We now give some background on Siegel modular forms.
We will skip the basic definitions, which can be found in many places (e.g. [vdG08]),
and instead emphasize paramodular forms. Since our main application is cohomology
of subgroups of $\mathrm{SL}_4(\mathbb{Z})$, we will focus on $\mathrm{Sp}_4$.

Let $K(N)$ be the subgroup of $\mathrm{Sp}_4(\mathbb{Q})$ consisting of all matrices
of the form

    $\begin{pmatrix}
      \mathbb{Z} & N\mathbb{Z} & \mathbb{Z} & \mathbb{Z} \\
      \mathbb{Z} & \mathbb{Z} & \mathbb{Z} & N^{-1}\mathbb{Z} \\
      \mathbb{Z} & N\mathbb{Z} & \mathbb{Z} & \mathbb{Z} \\
      N\mathbb{Z} & N\mathbb{Z} & N\mathbb{Z} & \mathbb{Z}
    \end{pmatrix}$.

The group $K(N)$ is called the paramodular group. It contains as a subgroup the
standard congruence subgroup $\Gamma'_0(N)$ modeled on the Klingen parabolic; that
is, $\Gamma'_0(N) \subset \mathrm{Sp}_4(\mathbb{Z})$ is the intersection
$K(N) \cap \mathrm{Sp}_4(\mathbb{Z})$. Paramodular forms are Siegel modular forms
that are modular with respect to $K(N)$. Clearly such forms are also modular with
respect to $\Gamma'_0(N)$, although modular forms on the latter are not necessarily
paramodular. Note also that in the embedding
$\iota\colon \mathrm{Sp}_4(\mathbb{Z}) \to \mathrm{SL}_4(\mathbb{Z})$, we have
$\iota(\Gamma'_0(N)) = \iota(\mathrm{Sp}_4(\mathbb{Z})) \cap \Gamma_0(N)$.

The paramodular forms of interest to us are those of prime level $N$ and weight
3. We denote the complex vector space of such forms by $P_3(N)$. One can show that
$P_3(N)$ consists entirely of cuspforms, i.e. there are no weight 3 paramodular
Eisenstein series. Recently T. Ibukiyama [Ibu07] proved a general dimension formula
for $P_3(N)$:

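Theorem 1. [Ibu07, Theorem 2.1] Let $N$ be prime and let $\kappa(a)$ be the
Kronecker symbol $\left(\frac{a}{N}\right)$. Define functions
$f, g\colon \mathbb{Z} \to \mathbb{Q}$ by

    $f(N) = \begin{cases} 2/5 & \text{if } N \equiv 2, 3 \bmod 5, \\ 1/5 & \text{if } N = 5, \\ 0 & \text{otherwise,} \end{cases}$

and

    $g(N) = \begin{cases} 1/6 & \text{if } N \equiv 5 \bmod 12, \\ 0 & \text{otherwise.} \end{cases}$

We have $\dim P_3(2) = \dim P_3(3) = 0$. For $N \geq 5$, we have

    $\dim P_3(N) = (N^2 - 1)/2880$
        $+ (N + 1)(1 - \kappa(-1))/64 + 5(N - 1)(1 + \kappa(-1))/192$
        $+ (N + 1)(1 - \kappa(-3))/72 + (N - 1)(1 + \kappa(-3))/36$
        $+ (1 - \kappa(2))/8 + f(N) + g(N) - 1$.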
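The formula of Theorem 1 is easy to evaluate exactly. The following Python sketch does so with rational arithmetic; the function names are hypothetical, and the Kronecker symbol is computed by Euler's criterion, which is valid here because $N$ is an odd prime not dividing $-1$, $-3$ or $2$ when $N \geq 5$.

```python
from fractions import Fraction

def kronecker(a, n):
    """Legendre symbol (a/n) for an odd prime n, via Euler's criterion."""
    r = pow(a % n, (n - 1) // 2, n)
    return -1 if r == n - 1 else r

def dim_P3(n):
    """dim P_3(N) for a prime N, evaluated from the formula of Theorem 1."""
    if n in (2, 3):
        return 0
    k = lambda a: kronecker(a, n)
    f = Fraction(2, 5) if n % 5 in (2, 3) else Fraction(1, 5) if n == 5 else 0
    g = Fraction(1, 6) if n % 12 == 5 else 0
    d = (Fraction(n * n - 1, 2880)
         + Fraction((n + 1) * (1 - k(-1)), 64)
         + Fraction(5 * (n - 1) * (1 + k(-1)), 192)
         + Fraction((n + 1) * (1 - k(-3)), 72)
         + Fraction((n - 1) * (1 + k(-3)), 36)
         + Fraction(1 - k(2), 8)
         + f + g - 1)
    assert d.denominator == 1      # sanity check: a dimension must be an integer
    return int(d)
```

For instance, evaluating the formula by hand at $N = 127$ gives $5.6 + 4 + 7 + 0.4 - 1 = 16$, and the function returns the same value.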




For any $N$, the space of weight $k$ paramodular forms contains a distinguished
subspace $P^G_k(N)$ originally constructed by Gritsenko [Gri95]. This space is
defined by a lift from the space $J^{\mathrm{cusp}}_{k,N}$ of cuspidal Jacobi forms
of weight $k$ and index $N$ to the space of weight $k$ paramodular forms. We will
not define Jacobi forms here, and instead refer the reader to [EZ80] for background.
For our purposes, we will only need to know the dimension
$\dim P^G_3(N) = \dim J^{\mathrm{cusp}}_{3,N}$. Formulas for the dimensions of
spaces of Jacobi forms can be found in [EZ80, pp. 121, 131-132]; the following
reformulation is due to N. Skoruppa:

Theorem 2. We have

(7)    $\dim J^{\mathrm{cusp}}_{k,m} = \sum_{j=1}^{m-1} \left( s(k + 2j - 1) - \left\lfloor \frac{j^2}{4m} \right\rfloor \right)$,

where $s(k)$ is the dimension of the space of cuspidal elliptic modular forms of
full level and weight $k$.

Let $P^{nG}_3(N)$ be the Hecke complement in $P_3(N)$ of the subspace $P^G_3(N)$ of
Gritsenko lifts. The dimension of this space is easily determined by Theorems 1
and 2.
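As a numerical check on Formula (7), the sum can be evaluated directly. The Python sketch below assumes the standard dimension formula for cusp forms on $\mathrm{SL}_2(\mathbb{Z})$ for the function $s$; the function names are hypothetical.

```python
def s(w):
    """Dimension of the space of cusp forms of weight w on SL_2(Z)."""
    if w < 12 or w % 2:
        return 0
    return w // 12 - 1 if w % 12 == 2 else w // 12

def dim_jacobi_cusp(k, m):
    """Formula (7): sum over j = 1, ..., m-1 of s(k + 2j - 1) - floor(j^2 / (4m))."""
    return sum(s(k + 2 * j - 1) - (j * j) // (4 * m) for j in range(1, m))
```

At weight 3 and index 127 the sum is $1281 - 1268 = 13$. Evaluating Theorem 1 at $N = 127$ gives $\dim P_3(127) = 16$, so the complement of the Gritsenko lifts has dimension $16 - 13 = 3$ at this level, which agrees with the value quoted in Example 1.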

We conclude our discussion of paramodular forms by defining the Hecke polynomial
attached to an eigenform. More details can be found in [PY09]. Let $l$ be a
prime not dividing $N$. Then associated to $l$ there are two Hecke operators $T_l$
and $T_{l^2}$. For an eigenform $h \in P_3(N)$ we denote the corresponding
eigenvalues by $\delta_l$, $\delta_{l^2}$:

    $T_l h = \delta_l h, \qquad T_{l^2} h = \delta_{l^2} h$.

We define the Hecke polynomial attached to $h$ by

(8)    $H_{\mathrm{Sp}}(h) = 1 - \delta_l T + (\delta_l^2 - \delta_{l^2} - l^2) T^2 - \delta_l l^3 T^3 + l^6 T^4$.

This polynomial is essentially the local factor at $l$ attached to the spinor
$L$-function for $h$.
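A short Python sketch shows how the coefficients of Formula (8) are assembled from the two eigenvalues. The function name is hypothetical, and the eigenvalues used in any call are placeholders chosen for illustration, not data from an actual paramodular form.

```python
def hecke_poly_sp(l, d1, d2):
    """Coefficients [c0, c1, c2, c3, c4] of Formula (8), in increasing powers of T.

    l is the prime, d1 stands for the eigenvalue of T_l and d2 for that of T_{l^2}.
    """
    return [1, -d1, d1 * d1 - d2 - l * l, -d1 * l**3, l**6]
```

Whatever the eigenvalues are, the coefficients satisfy $c_3 = l^3 c_1$ and $c_4 = l^6 c_0$, the symmetry one expects from a local factor of this shape.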

4. Conjectures and computational results 

In this section we present two conjectures on the structure of
$H^5(\Gamma_0(N); \mathbb{C})$ for $N$ prime, and conclude by describing our
computational evidence for them.

4.1. Notation. We begin by fixing notation for the different constituents of the 
cohomology. 

• Weight two holomorphic modular forms: Let $\sigma_2$ be the Galois
representation attached to a holomorphic weight 2 newform $f$ of level $N$ with
trivial Nebentypus. Let $\alpha$ be the eigenvalue of the classical Hecke
operator $T_l$ on $f$. Let IIa($\sigma_2$) and IIb($\sigma_2$) be the Galois
representations in the first two rows of Table 5.

• Weight four holomorphic modular forms: Let $\sigma_4$ be the Galois
representation attached to a holomorphic weight 4 newform $f$ of level $N$ with
trivial Nebentypus. Let $\beta$ be the eigenvalue of the classical Hecke operator
$T_l$ on $f$. Let IV($\sigma_4$) be the Galois representation in the third row of
Table 5.

• Cuspidal cohomology classes from subgroups of $\mathrm{SL}_3(\mathbb{Z})$: Let
$\tau$ be the Galois representation conjecturally attached to a pair of
nonselfdual cuspidal cohomology classes
$\eta, \eta' \in H^3(\Gamma^*_0(N); \mathbb{C})$, where
$\Gamma^*_0(N) \subset \mathrm{SL}_3(\mathbb{Z})$ is the congruence subgroup with
bottom row congruent to (0, 0, *) modulo $N$. Let $\gamma$ be the eigenvalue of
the Hecke operator $T_{l,1}$ on $\eta$, and let $\gamma'$ be its complex conjugate.
Let IIIa($\tau$) and IIIb($\tau$) be the Galois representations in the last two
rows of Table 5.

If $f$ is a weight 2 or weight 4 eigenform as above, or a weight 3 paramodular
eigenform, we denote by $d_f$ the degree of the extension of $\mathbb{Q}$ generated
by the Hecke eigenvalues of $f$. We say that two eigenforms $f, g$ are Galois
conjugate if there is an automorphism
$\sigma \in \mathrm{Gal}(\overline{\mathbb{Q}}/\mathbb{Q})$ such that the Hecke
eigenvalues of $f$ are taken into those of $g$ by $\sigma$. We say $f, g$ are
equivalent if $g$ is a $\mathbb{Q}$-linear combination of $f$ and its Galois
conjugates. We extend these notions in the obvious way to eigenclasses
$\eta \in H^3(\Gamma^*_0(N); \mathbb{C})$.

For any modular form $f$ of weight $k$ with Fourier expansion
$f(z) = \sum_n a_n e^{2\pi i n z}$, let $L(s, f)$ be the Dirichlet series
$\sum_n a_n / n^s$. The series $L(s, f)$ can be completed to a function
$\Lambda(s, f)$ satisfying a functional equation of the shape $s \mapsto k - s$.

4.2. Eisenstein cohomology. 

Conjecture 1. Let $N$ be prime. Then the cohomology group
$H^5(\Gamma_0(N); \mathbb{C})$ contains the following Eisenstein subspaces:

(1) For each equivalence class of weight two holomorphic newforms of level $N$,
choose a representative $f$ with associated Galois representation $\sigma_2$. Then
there are two $d_f$-dimensional subspaces in the cohomology, one attached to
the Galois representation IIa($\sigma_2$), and the other to the Galois
representation IIb($\sigma_2$).

(2) For each equivalence class of weight four holomorphic newforms of level $N$,
choose a representative $f$ with associated Galois representation $\sigma_4$. Then
if the central special value $\Lambda(2, f)$ vanishes, there is a $d_f$-dimensional
subspace in the cohomology attached to the Galois representation IV($\sigma_4$).

(3) For each equivalence class of nonselfdual cuspidal cohomology classes in
$H^3(\Gamma^*_0(N); \mathbb{C})$, $\Gamma^*_0(N) \subset \mathrm{SL}_3(\mathbb{Z})$,
choose a representative $\eta$ and let $\tau$ be the conjecturally associated
Galois representation. Then there are two $d_\eta$-dimensional subspaces, one
attached to the Galois representation IIIa($\tau$), and the other to the Galois
representation IIIb($\tau$).

Furthermore, for $N$ prime this is a complete description of the Eisenstein subspace
of $H^5(\Gamma_0(N); \mathbb{C})$.

In our earlier paper [AGM08], we also gave a conjecture about some Eisenstein
subspaces of $H^5$. In fact, for weight 2 modular forms and for
$\mathrm{SL}_3$-cuspidal cohomology, there is no difference between
[AGM08, Conjecture 1] and the conjecture here. The new part is in the contribution
of the weight 4 modular forms. In [AGM08], our data was only sufficient to suggest
that the weight 4 forms $f$ appearing were those whose completed $L$-functions
$\Lambda(s, f)$ have a minus sign in their functional equations. Certainly these
forms lie in the subspace of forms whose central special value vanishes, but there
are additional forms with vanishing central special value that also contribute
(cf. Example 1 below).

Because of our extensive computations, we feel confident that Conjecture 1
completely describes the Eisenstein subspace for prime level. However, Conjecture 1
is not true for composite $N$, as already remarked in the paragraph after
[AGM08, Example 1].

4.3. Paramodular forms.

Conjecture 2. For $N$ prime, choose an equivalence class of eigenforms in
$P^{nG}_3(N)$, and let $h$ be a representative. Let $d_h$ be the degree of the
extension of $\mathbb{Q}$ generated by the eigenvalues of $h$. Then the cuspidal
cohomology $H^5_{\mathrm{cusp}}(\Gamma_0(N); \mathbb{C})$ contains a
$2d_h$-dimensional subspace spanned by Hecke eigenclasses. If $\xi$ is an eigenclass
in this space, then up to Galois conjugacy the Hecke polynomial $H(\xi)$ of $\xi$
from (6) agrees with the Hecke polynomial $H_{\mathrm{Sp}}(h)$ of $h$ from (8).

We remark that this equality means that $\xi$ is the functorial lift of $h$ with
respect to the natural inclusion of $L$-groups:
${}^L\mathrm{GSp}_4 \to {}^L\mathrm{GL}_4$.

4.4. Computational results. These are listed in Table 4, which shows our computed
Betti numbers and the dimensions of the constituents of the cohomology predicted by
Conjectures 1 and 2. For levels < 101, we checked that the Hecke polynomial for
$l = 2$ is correct.

Example 1. We consider the case $N = 127$. There are two weight 2 eigenforms,
with Hecke eigenvalues defining respectively a cubic and a septic field. There are
three weight 4 eigenforms, with Hecke eigenvalues defining fields of degrees 1, 13,
and 17. The degree 13 eigenform has minus sign in the functional equation of its
$L$-function, which means its central special value vanishes. However, there is also
another vanishing at this level: the rational eigenform also has vanishing central
special value, vanishing that is not forced by the sign of the functional equation.
Thus together these modular forms account for a $2 \times 10 + 14 = 34$ dimensional
subspace of $H^5(\Gamma_0(N); \mathbb{C})$.

For the rest of the cohomology, we must consider $\mathrm{SL}_3$ and paramodular
contributions. There is no cuspidal cohomology for
$\Gamma^*_0(127) \subset \mathrm{SL}_3(\mathbb{Z})$. The space of non-Gritsenko
lifts has dimension 3. Thus we see an additional 6-dimensional subspace of
$H^5(\Gamma_0(127); \mathbb{C})$ coming from these Siegel modular forms, which means
$\dim H^5(\Gamma_0(127); \mathbb{C}) \geq 40$. Indeed our computations indicate that
this Betti number equals 40.
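The bookkeeping in Example 1 follows the pattern predicted by Conjectures 1 and 2: weight 2 forms, $\mathrm{SL}_3$ classes and non-Gritsenko paramodular forms each contribute twice, and weight 4 forms with vanishing central value contribute once. A minimal Python sketch of this count, with a hypothetical function name, is:

```python
def predicted_betti(s2, s4_vanishing, sl3, p3_ng):
    """Dimension of H^5 predicted by Conjectures 1 and 2 for prime level.

    s2: dimension of weight 2 cusp forms; s4_vanishing: weight 4 forms with
    vanishing central special value; sl3: cuspidal cohomology of SL_3;
    p3_ng: weight 3 paramodular forms that are not Gritsenko lifts.
    """
    return 2 * (s2 + sl3 + p3_ng) + s4_vanishing

# Level N = 127, with the data quoted in Example 1.
print(predicted_betti(s2=3 + 7, s4_vanishing=13 + 1, sl3=0, p3_ng=3))
```

For $N = 127$ this reproduces the total $2 \times (10 + 0 + 3) + 14 = 40$.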
Table 4. Betti numbers and constituents of Conjectures 1 and 2.
The entries are the dimensions of the spaces in the headings, which
are as follows: (i) $S_2(N)$ denotes weight 2 cuspidal modular forms
of level $N$ and trivial character, (ii) $S_4(N)_0$ denotes weight 4 modular
forms of level $N$, trivial character, and with vanishing central
special value, (iii) $\mathrm{SL}_3$ denotes the cuspidal cohomology of the
congruence subgroup $\Gamma^*_0(N) \subset \mathrm{SL}_3(\mathbb{Z})$, and
(iv) $P^{nG}_3(N)$ denotes weight 3 paramodular forms of level $N$ that are not
Gritsenko lifts. In all cases 2 x (second + fourth + fifth) + third equals the
dimension of $H^5(\Gamma_0(N); \mathbb{C})$.